feat(xsd-ingest): broaden default ingest to full Transitional bundle #6
Merged
…l bundle

The default ingest only walked wml.xsd's import closure (12 of 26 XSDs). SML, PML, VML, and several standalone shared schemas - including shared-customXmlDataProperties.xsd (the home of ds:datastoreItem) - never reached the schema graph, so structural tools failed on anything outside WordprocessingML.

Default entrypoints become an explicit list of 9 roots whose union closure covers all 26 files in data/xsd-cache/ecma-376-transitional/. Explicit over glob so a stray file in the cache directory can't quietly land in production ingest.

No code changes to vocabulary.ts: every targetNamespace the broader set declares is already registered. No spec-prose vs XSD URI alias is added - that's a separate concern.

Adds a smoke test that ingests the full closure and asserts (a) 26 documents parsed, (b) ds:datastoreItem resolves under the customXml namespace, (c) SML / PML top-level elements land in their vocabularies, and (d) no unresolved child / group / attrGroup edges. Floors are set above today's WML-only baseline so a regression that drops a vocabulary fails the test.

This PR ships code only; the production DB is not mutated as part of merging. See the PR body for the post-merge runbook (run xsd:ingest, expected deltas, smoke checks).
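The difference between a single-root closure and an explicit multi-root list can be sketched as below. This is a minimal illustration, not the PR's code: `importClosure`, `exampleImports`, and the `shared-a.xsd` / `shared-b.xsd` names and edges are all hypothetical, and the real ingest presumably reads imports from the XSD files rather than from an in-memory map.

```typescript
// Hypothetical import graph: file name -> files it imports directly.
// File names and edges are illustrative, not the real bundle's layout.
const exampleImports = new Map<string, readonly string[]>([
  ["wml.xsd", ["shared-a.xsd", "shared-b.xsd"]],
  ["sml.xsd", ["shared-b.xsd"]],
  ["shared-a.xsd", []],
  ["shared-b.xsd", []],
  // Standalone schema: nothing imports it, so only an explicit root reaches it.
  ["shared-customXmlDataProperties.xsd", []],
]);

// Collect every file reachable from the given roots by following imports.
function importClosure(
  roots: readonly string[],
  imports: ReadonlyMap<string, readonly string[]>,
): Set<string> {
  const seen = new Set<string>();
  const pending = [...roots];
  while (pending.length > 0) {
    const file = pending.pop() as string;
    if (seen.has(file)) continue;
    const direct = imports.get(file);
    if (direct === undefined) {
      // A root or import that is absent from the cache is an error, not a skip.
      throw new Error(`schema not found in cache: ${file}`);
    }
    seen.add(file);
    pending.push(...direct);
  }
  return seen;
}

// One root misses the standalone schema; an explicit root list picks it up.
const wmlOnly = importClosure(["wml.xsd"], exampleImports);
const explicitRoots = importClosure(
  ["wml.xsd", "sml.xsd", "shared-customXmlDataProperties.xsd"],
  exampleImports,
);
```

Because the roots are listed by name, a file that merely sits in the cache directory is never ingested unless some listed root imports it, which is the property the commit message gives for preferring an explicit list over a glob.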
Two P3 findings from PR review:

- scripts/ingest-xsd/README.md still claimed the default ingest walked wml.xsd's 12-document closure. Updated to describe the 9-root / 26-XSD default and how to narrow it back when needed.
- tests/ingest-xsd/ingest.test.ts gated the new full-bundle smoke test on just wml.xsd. A dev with a partial cache (e.g. someone who fetched WML only for hand-testing) would have the test attempt to readFile a missing root and fail. Now gates on all 9 default entrypoints; the existing WML-only smoke test keeps its narrow wml.xsd-only gate.
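The gating fix in the second finding amounts to checking every entrypoint a test needs, not just one. A minimal sketch follows, assuming a hypothetical helper `allEntrypointsPresent` and an injected `exists` predicate so the example runs without touching the filesystem; the entrypoint names other than wml.xsd are illustrative and the real test's helper may look different.

```typescript
// Hypothetical helper: true only when every listed entrypoint exists in the cache.
// `exists` is injected so the check can be exercised without real files.
function allEntrypointsPresent(
  cacheDir: string,
  entrypoints: readonly string[],
  exists: (path: string) => boolean,
): boolean {
  return entrypoints.every((name) => exists(`${cacheDir}/${name}`));
}

const cacheDir = "data/xsd-cache/ecma-376-transitional";

// Simulate a partial cache where only WML was fetched for hand-testing.
const partialCache = new Set<string>([`${cacheDir}/wml.xsd`]);
const onDisk = (path: string): boolean => partialCache.has(path);

// The narrow WML-only smoke test can run against this cache...
const wmlReady = allEntrypointsPresent(cacheDir, ["wml.xsd"], onDisk);

// ...but the full-bundle test should be skipped rather than fail on a missing root.
// The list here is an illustrative subset, not the PR's 9 entrypoints.
const fullReady = allEntrypointsPresent(
  cacheDir,
  ["wml.xsd", "sml.xsd", "pml.xsd"],
  onDisk,
);
```

In a real test file the predicate could be `existsSync` from `node:fs`, with the result feeding a skip condition such as the `skipIf(!realCacheReady)` gate mentioned in the PR body. Note that an empty entrypoint list makes `every` return true, so the list should be checked for emptiness separately if that matters.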
Walks all 26 Transitional XSDs instead of just `wml.xsd`'s import closure (12 of 26). SML, PML, VML, and several standalone shared schemas were never reaching the schema graph, so structural lookups failed on everything outside WordprocessingML. The motivating case was `ds:datastoreItem`, which lives in `shared-customXmlDataProperties.xsd` and was unreachable.

No `vocabulary.ts` change: every targetNamespace the broader set declares is already registered. No spec-prose / XSD URI alias added; that's a separate concern.

This ships code only. The production DB is not touched by merging. Run the ingest post-merge.
Runbook (post-merge, against production DB)
Expected stats from the new default closure (captured locally against a clean DB):
Smoke checks after ingest:
Verified locally: full test suite 59 pass / 0 fail / 0 skip (smoke tests previously skipped via `skipIf(!realCacheReady)` now run with the cache populated).

Review: confirm the explicit entrypoint list captures the bundle you'd expect; flag if any namespace should be excluded from the default profile.

Ignore: vocabulary.ts (unchanged) and ooxml-tools.ts (unchanged from PR 1).